Analysis of Sum-Weight-like algorithms for
averaging in Wireless Sensor Networks

Franck Iutzeler, Philippe Ciblat and Walid Hachem 
Abstract 

Distributed estimation of the average value over a Wireless Sensor Network has recently received a 
lot of attention. Most papers consider single variable sensors and communications with feedback (e.g. 
peer-to-peer communications). However, in order to use efficiently the broadcast nature of the wireless 
channel, communications without feedback are advocated. To ensure the convergence in this feedback-free
case, the recently-introduced Sum-Weight-like algorithms, which rely on two variables at each sensor,
are a promising solution. In this paper, the convergence towards the consensus over the average of the
initial values is analyzed in depth. Furthermore, it is shown that the squared error decreases exponentially
with time. In addition, a powerful algorithm relying on the Sum-Weight structure and taking into
account the broadcast nature of the channel is proposed.

I. Introduction 

The recent years have seen a surge of signal processing and estimation technologies operating in 
stressful environments. These environments do not make possible the use of a fusion center so the 
units/sensors have to behave in a distributed fashion. In various applications, sensors have to communicate 
through wireless channels because of the lack of infrastructure. Hence, along with distributed computation, 
the problem of communicating between the different sensors to estimate a global value is a key issue 
and was pioneered by Tsitsiklis 0. 

One of the most studied problems in Wireless Sensor Networks is the average computation of the 
initial measurements of the sensors. More precisely, each sensor wants to reach consensus over the mean 
of the initial values. A basic technique to address this problem, called Random Gossip, is to make the 
sensors exchange their estimates in pairs and average them. This technique has been widely analyzed in
terms of convergence and convergence speed in [3], [4], [5].

Finding more efficient exchange protocols has been a hot topic for the past few years; the proposed 
improvements were essentially twofold: i) exploiting the geometry of the network to have a more efficient 
mixing between the values (e.g. [6], [7], [8]) and ii) taking advantage of the broadcast nature of the
wireless channels (e.g. [9] without feedback link, and [10] with feedback link). Whereas the use of
network geometry has received a lot of attention, the use of the broadcast nature of the wireless channel 
is less studied albeit promising. Therefore, in our paper, we will focus on averaging algorithms taking 
into account the broadcast nature of the channel. In order to keep the number of communications as low 
as possible, we forbid the use of feedback links. 

In the feedback-free context, one can mention [9]. However, even if the algorithm described in [9]
converges quickly to a consensus, the reached value is incorrect. This can be explained by the fact that
the sum of the sensor estimates is not constant over time. To overcome this problem, Franceschelli et
al. [11] proposed to use well-chosen updates on two local variables per sensor while using the broadcast
nature of the channel without feedback link. A more promising alternative is to use the Sum-Weight
scheme proposed by Kempe [12] and studied more generally by Benezit [13]. In this setup, two local
variables are also used: one representing the sum of the received values and the other representing the
weight of the sensor (namely, the proportion of the sensor activity compared to the others). The two
variables are transmitted at each iteration and both are updated in the same manner. The wanted estimate
is then the quotient of these values. The convergence of this class of algorithms (without necessarily
sum-conservation) has been proven in [12], [13]. In contrast, their convergence speed has never been
theoretically evaluated except in [12] for a very specific case.

The goal of this paper is to theoretically analyze the convergence speed for any Sum-Weight-like 
algorithm. As a by-product, we obtain necessary and sufficient condition for the convergence. In addition, 
we propose a new Sum-Weight-like algorithm based on broadcasting which outperforms existing ones. 

This paper is organized as follows: the notations and assumptions on the network model and on the 
Sum-Weight-like algorithms are provided in Section II. Section III is dedicated to the theoretical analysis
of the squared error of the algorithms and provides the main contributions of the paper. In Section
IV, we propose new Sum-Weight-like algorithms. In Section V, we compare our results with previous
derivations done in the literature for the Sum-Weight-like algorithms as well as the algorithms based on
the exchange of a single variable between the nodes. Section VI is devoted to numerical illustrations.
Concluding remarks are drawn in Section VII.

II. Model and Assumptions

A. Network model 

The sensor network will be modeled by a directed graph $\mathcal{G} = (V, E)$, $V$ being the set of vertices/sensors
and $E$ being the set of edges which models the possible links between the sensors. We also define the
adjacency matrix $A$ of $\mathcal{G}$ as the $N \times N$ matrix such that $(A)_{ij}$ equals 1 if there is an edge from $i$ to
$j$ and 0 otherwise. We define the neighborhood of each sensor $i$ as $\mathcal{N}_i = \{j \in V : (i,j) \in E\}$.
Let $d_i = |\mathcal{N}_i|$ denote the degree of the sensor $i$, where $|\mathcal{A}|$ represents the cardinality of the set $\mathcal{A}$. Let
$d_{\max} = \max_i d_i$ be the maximum degree. Let $D = \mathrm{diag}(d_1, \ldots, d_N)$ and $L = D - A$ be the degree
matrix and the Laplacian matrix respectively [14].

Every sensor $i$ has an initial value $x_i(0)$ and we define $x(0) = [x_1(0), \ldots, x_N(0)]^T$ where the
superscript $T$ stands for the transposition. The goal of the network is to communicate through the edges of the
underlying graph to reach consensus over the mean of the initial values of the sensors. A communication
and estimation step will be referred to as an update.

We will assume that the network follows a discrete time model such that the time $t$ is the time of the
$t$-th update. As an example, every sensor could be activated by an independent Poisson clock. The time
would then be counted as the total number of clock ticks across the network. We will denote by $x_i(t)$ the
$i$-th sensor estimate at time $t$ and $x(t) = [x_1(t), \ldots, x_N(t)]^T$.

B. Averaging Algorithms 

The goal of averaging algorithms is to make the vector of estimates $x(t)$ converge to $x_{\mathrm{ave}}\mathbf{1}$, also
known as the consensus vector, where $\mathbf{1}$ is the length-$N$ vector of ones and $x_{\mathrm{ave}} = (1/N)\mathbf{1}^T x(0)$ is the
average of the initial values of the sensors. In the present state-of-the-art, two classes of algorithms exist
and are described below.

1) Class of Random Gossip algorithms: In standard gossip algorithms (e.g. [4]), sensors update their
estimate according to the equation $x(t+1)^T = x(t)^T K(t)$ where the $K(t)$ are doubly-stochastic matrices.
We recall that a matrix $K$ is said to be row-stochastic (resp. column-stochastic) when all its elements are
non-negative and when $K\mathbf{1} = \mathbf{1}$ (resp. $K^T\mathbf{1} = \mathbf{1}$). A matrix which is both row- and column-stochastic is said
to be doubly-stochastic. Since two sensors can exchange information only across the edges of the graph,
for any $i \neq j$, $(K(t))_{ij}$ cannot be positive if $(A)_{ij} = 0$. From an algorithmic point of view, the
row-stochasticity implies that the sum of the values is unchanged: $x(t+1)^T\mathbf{1} = x(t)^T K(t)\mathbf{1} = x(t)^T\mathbf{1}$, whereas
the column-stochasticity implies that the consensus is stable: if $x(t) = c\mathbf{1}$, then $x(t+1)^T = x(t)^T K(t) =
c\mathbf{1}^T K(t) = c\mathbf{1}^T$. For these reasons, double-stochasticity is desirable. However, using doubly-stochastic
matrices implies a feedback which is not always possible. In particular, if a sensor sends information to
multiple neighbors, the feedback message might raise multiple access problems. Similarly, if the message
is sent through a long route within the network, the same route may not exist anymore for feedback in
the context of mobile wireless networks. As these algorithms only rely on the exchanges of one variable
per sensor, they will be called single-variate algorithms in the rest of the paper.
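
As an illustration of the single-variate class, the following minimal sketch simulates pairwise Random Gossip, where one randomly chosen edge averages its two estimates at each update (a doubly-stochastic update, which requires feedback). The 4-node ring, the initial values and the function name `pairwise_gossip_step` are assumptions made for this example only.

```python
import random

def pairwise_gossip_step(x, edges, rng):
    """One Random Gossip update: a random edge (i, j) averages its two estimates."""
    i, j = rng.choice(edges)
    mean = 0.5 * (x[i] + x[j])
    x[i] = x[j] = mean          # both ends are updated, hence the need for feedback
    return x

rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # hypothetical 4-node ring
x0 = [1.0, 5.0, 3.0, 7.0]
x = list(x0)
for _ in range(200):
    pairwise_gossip_step(x, edges, rng)

x_ave = sum(x0) / len(x0)
print("sum conserved:", abs(sum(x) - sum(x0)) < 1e-9)
print("max deviation from average:", max(abs(v - x_ave) for v in x))
```

Each update leaves the sum of the estimates unchanged and keeps a consensus vector fixed, which are the two properties attributed above to double-stochasticity.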

2) Class of Sum-Weight algorithms: To overcome this drawback, a possible method is to use two
variables: one representing the sum of the received values and another representing the relative weight
of the sensor. For the sensor $i$ at time $t$, they will be respectively written $s_i(t)$ and $w_i(t)$. Writing $s(t) =
[s_1(t), \ldots, s_N(t)]^T$ and $w(t) = [w_1(t), \ldots, w_N(t)]^T$, both variables will be modified by the same update
matrix, $s(t+1)^T = s(t)^T K(t)$ and $w(t+1)^T = w(t)^T K(t)$. Finally, the estimate of sensor $i$ at time $t$
will be the quotient of the two variables, $x_i(t) = s_i(t)/w_i(t)$. The initialization is done as follows:

$s(0) = x(0)$, $w(0) = \mathbf{1}$.    (1)

For the sake of convergence we will need an important property: Mass Conservation

$\sum_{i=1}^{N} s_i(t) = \sum_{i=1}^{N} x_i(0) = N x_{\mathrm{ave}}$, $\sum_{i=1}^{N} w_i(t) = N$.    (2)

This clearly rewrites as $\forall t > 0$, $K(t)\mathbf{1} = \mathbf{1}$, which corresponds to sum-conservation as in classic gossip
algorithms and leads to row-stochastic update matrices.
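
The Sum-Weight update can be sketched in a few lines. The example below is only an illustration under assumed values: a hypothetical 3-sensor network in which sensor 0 pushes half of its pair $(s_0, w_0)$ to sensor 1 without any feedback. The update matrix is row-stochastic but not column-stochastic; the helper names are not taken from the paper.

```python
def row_vector_times_matrix(v, K):
    """Return the row vector v^T K as a list."""
    n = len(v)
    return [sum(v[i] * K[i][j] for i in range(n)) for j in range(n)]

def sum_weight_step(s, w, K):
    """Apply the same update matrix K to both the sum and weight variables."""
    return row_vector_times_matrix(s, K), row_vector_times_matrix(w, K)

x0 = [1.0, 5.0, 3.0]
s, w = list(x0), [1.0] * len(x0)          # initialization of Eq. (1)

# Sensor 0 keeps half of its pair and sends the other half to sensor 1.
K = [[0.5, 0.5, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

s, w = sum_weight_step(s, w, K)
estimates = [si / wi for si, wi in zip(s, w)]

print("s =", s, " w =", w)
print("mass conservation:", sum(s) == sum(x0), sum(w) == len(x0))
print("estimates =", estimates)
```

Because each row of K sums to one, the totals of s and w are preserved even though no feedback message is sent.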



C. Notations for the Sum-Weight scheme 

Let us now introduce some useful notations along with some fundamental assumptions for convergence
in the Sum-Weight scheme. Given two vectors $a$ and $b$ with the same size, we denote by $a/b$ the vector
of the elementwise quotients. The Sum-Weight algorithm is described by the following equations:

$x(t) = \frac{s(t)}{w(t)} = \left[\frac{s_1(t)}{w_1(t)}, \ldots, \frac{s_N(t)}{w_N(t)}\right]^T$
$s^T(t+1) = s^T(t)K(t) = x^T(0)P(t)$
$w^T(t+1) = w^T(t)K(t) = \mathbf{1}^T P(t)$

with $P(t) = K(1)K(2) \cdots K(t)$.

In the following, the matrix inequalities will be taken elementwise so that $M > 0$ (resp. $M \geq 0$)
means that the matrix $M$ is (elementwise) positive (resp. non-negative). We recall that a non-negative
matrix $M$ is said to be primitive if $M^m > 0$ for some $m \geq 1$ (see [15, Chap. 8.5] for details). We will
denote the Kronecker product by '$\otimes$'.

We can notice that reaching consensus is equivalent for $x(t)$ to converge to the consensus line $c\mathbf{1}$ where
$c$ is the consensus value. For this reason, it is useful to define $J = (1/N)\mathbf{1}\mathbf{1}^T$, the orthogonal projection matrix
to the subspace spanned by $\mathbf{1}$, and $(I - J)$, the orthogonal projection matrix to the complementary subspace
which can be seen as the error hyperplane. The matrix $I$ is the identity matrix with appropriate size.

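In order to intuitively understand the algorithm behavior, let us decompose $x^T(t)$ as follows

$x^T(t) = \frac{s^T(t)}{w^T(t)} = \frac{x^T(0)P(t)}{w^T(t)}$
$= \frac{x^T(0)JP(t)}{w^T(t)} + \frac{x^T(0)(I-J)P(t)}{w^T(t)}$
$= x_{\mathrm{ave}} \frac{\mathbf{1}^T P(t)}{\mathbf{1}^T P(t)} + \frac{x^T(0)(I-J)P(t)}{w^T(t)}$
$= x_{\mathrm{ave}} \mathbf{1}^T + \frac{x^T(0)(I-J)P(t)}{w^T(t)}$
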

Obviously, the algorithm will converge to the right consensus if the second term in the right hand side 
vanishes. Actually, under some mild assumptions related to the connectedness of the network, we expect 
the numerator which corresponds to a projection on the error hyperplane will converge to zero at an 
exponential rate while all the elements of w(t) are of order one. Proving these results will be the core 
of the paper. 



D. Assumptions on the update matrices $K(t)$

First, we will always assume that both following conditions are satisfied by any update matrix
associated with a Sum-Weight-like algorithm.

(A1) Matrices $(K(t))_{t>0}$ are independent and identically distributed (i.i.d.), and row-stochastic. The
matrix $K(t)$ is valued in a set $\mathcal{K} = \{K_i\}_{i=1,\ldots,M}$ of size $M < \infty$. Also, $p_i = \mathbb{P}[K(t) = K_i] > 0$.
(A2) Any matrix in $\mathcal{K}$ has a strictly positive diagonal.

The first assumption is just a reformulation of the mass conservation property introduced in Section II-B2
along with the assumption of a finite number of actions across the network. This assumption is reasonable
when one assumes that each sensor performs a finite number of actions. The second assumption forces
every sensor to keep a part of the information it had previously. We also define

$m_{\mathcal{K}} = \min_{i,j,k} \{ (K_k)_{ij} : (K_k)_{ij} > 0 \}$,

$p_{\mathcal{K}} = \min_k \{ \mathbb{P}[K(t) = K_k] \} = \min_k p_k > 0$.

In addition to both previous assumptions, we will see that the next assumption plays a central role in the
convergence analysis of any Sum-Weight-like algorithm.

(B) $E[K] = \sum_{i=1}^{M} p_i K_i$ is a primitive matrix.

In terms of graph theory, matrix $E[K]$ represents a weighted directed graph (see [15, Def. 6.2.11]).
Since it is primitive, this graph is strongly connected (see [15, Cor. 6.2.18] and [14]). Observe that this
graph contains a self-loop at every node due to Assumption (A2). In fact, the matrix $A + I$ coincides
with the so-called indicator matrix ([15, Def. 6.2.10]) of $E[K]$.
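
Assumption (B) can be checked numerically for a given $E[K]$. The sketch below tests primitivity on the zero/non-zero pattern of a non-negative matrix by raising it to successive powers, up to Wielandt's bound $(N-1)^2 + 1$ beyond which a primitive matrix must already have a positive power. The function name `is_primitive` and the three small test matrices are assumptions chosen for illustration.

```python
def is_primitive(M):
    """Check whether a non-negative square matrix M has a strictly positive power.

    Only the zero/non-zero pattern matters, so boolean matrices are used.
    By Wielandt's bound it suffices to test exponents up to (n - 1)**2 + 1.
    """
    n = len(M)
    pattern = [[M[i][j] > 0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in pattern]          # pattern of M^1
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        power = [[any(power[i][k] and pattern[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return False

# Directed ring with self-loops: strongly connected and aperiodic.
ring_with_loops = [[0.5, 0.5, 0.0],
                   [0.0, 0.5, 0.5],
                   [0.5, 0.0, 0.5]]
# Two isolated sensors: not connected.
isolated = [[1.0, 0.0],
            [0.0, 1.0]]
# Pure swap: strongly connected but periodic, hence not primitive.
swap = [[0.0, 1.0],
        [1.0, 0.0]]

print(is_primitive(ring_with_loops), is_primitive(isolated), is_primitive(swap))
```

The swap example shows why the positive diagonal of Assumption (A2) matters: strong connectivity alone does not rule out periodic behavior.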

III. Mathematical results 

A. Preliminary results 

The assumption (B) can be re-written in different ways thanks to the next Lemma.

Lemma 1. Under assumptions (A1) and (A2), the following propositions are equivalent to (B):

(B1) $\forall (i,j) \in \{1, \ldots, N\}^2$, $\exists L_{ij} < N$ and a realization of $P(L_{ij})$ verifying $(P(L_{ij}))_{ij} > 0$.
(B2) $\exists L < 2N^2$ and a realization of $P(L)$ which is a positive matrix.
(B3) $E[K \otimes K] = \sum_{i=1}^{M} p_i K_i \otimes K_i$ is a primitive matrix.

The proof is reported in Appendix A. This Lemma will be very useful in the sequel since it enables
us to interpret the Assumption (B) in various manners.

Our approach for analyzing the convergence of Sum-Weight algorithms is inspired by [12] (with a
number of important differences explained below) and so relies on the analysis of the Squared Error
(SE). Actually, the Squared Error can be upper-bounded by a product of two terms as follows

$\|x(t) - x_{\mathrm{ave}}\mathbf{1}\|_2^2 = \sum_{i=1}^{N} |x_i(t) - x_{\mathrm{ave}}|^2 = \sum_{i=1}^{N} \frac{1}{w_i(t)^2} |s_i(t) - x_{\mathrm{ave}} w_i(t)|^2$    (5)

$= \sum_{i=1}^{N} \frac{1}{w_i(t)^2} \left| \sum_{j=1}^{N} x_j(0) \left[(I-J)P(t)\right]_{ji} \right|^2$

$\leq \Psi_1(t)\Psi_2(t)$    (6)

with

$\Psi_1(t) = \frac{\|x(0)\|_2^2}{[\min_k w_k(t)]^2}$    (7)

$\Psi_2(t) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left| \left[(I-J)P(t)\right]_{ij} \right|^2$    (8)

Notice that this decomposition of the Squared Error mimics the decomposition of $x^T(t)$ given in Section II-C.

From now on, our main contributions will be to understand the behavior of both terms $\Psi_1(t)$ and $\Psi_2(t)$
when $t$ is large. In Section III-B, we will prove that $\Psi_1(t)$ can be upper bounded infinitely often. The
term $\Psi_2(t)$ represents the projection of the current sensor values on the orthogonal space to the consensus
line. The analysis of this term is drawn in Section III-C.



B. Analysis of $\Psi_1(t)$

This term depends on the inverse of the minimum of the sensors' weights (see Eq. (7)) and thus can
increase quickly. However, the sensors frequently exchange information and hence spread their weight so
the probability that a node weight keeps decreasing for a long time is very small. We will work on this
probability and show that it can be made as small as one wants considering a sufficiently long amount
of time. This will enable us to prove that $\Psi_1(t)$ will be infinitely often lower than a finite constant. To
obtain these results, some preliminary lemmas are needed.

First, we will focus on the behavior of the nodes' weights and especially on their minimum. One can
remark that at every time $t$ there is at least one node whose weight is greater than or equal to 1 (as
the weights are non-negative and $\forall t > 0$, $\sum_i w_i(t) = N$ because of the mass conservation exhibited in
Eq. (2)). As $w(t_0 + t)^T = w(t_0)^T P(t_0, t_0 + t)$ where $P(t_0, t_0 + t) = K(t_0) \cdots K(t_0 + t)$, it is interesting
to focus on i) the minimum non-null value of $P(t_0, t_0 + t)$ and ii) on the instants where $P(t_0, t_0 + t)$ is
positive.

Lemma 2. For all $t, t_0 > 0$, all the non-null coefficients of $P(t_0, t_0 + t)$ are greater than or equal to
$(m_{\mathcal{K}})^t$.

Proof: Let us recall that $m_{\mathcal{K}}$ is the smallest non-null entry of all the matrices belonging to the set $\mathcal{K}$,
as defined in Section II-D. Let us consider the random matrix $P(t)$ (as the matrix choice is i.i.d., we drop the
offset $t_0$). We will then prove this result by induction. It is trivial to see that every non-null coefficient
of $P(1) = K(1)$ is greater than or equal to $m_{\mathcal{K}}$, and as

$(P(t))_{ij} = \sum_{k=1}^{N} (P(t-1))_{ik} (K(t))_{kj}$,

it is obvious that if $(P(t))_{ij} > 0$, then there is a term in the sum that is positive (we recall that all the
coefficients here are non-negative). This term is the product of a positive coefficient of $P(t-1)$ and a
positive coefficient of $K(t)$. Hence, if all the non-null coefficients of $P(t-1)$ are greater than or equal to $(m_{\mathcal{K}})^{t-1}$,
then any non-null coefficient of $P(t)$ is greater than or equal to $(m_{\mathcal{K}})^{t-1} \cdot m_{\mathcal{K}} = (m_{\mathcal{K}})^t$. So, by induction, we have
that $\forall t > 0$ every non-null coefficient of $P(t)$ is greater than or equal to $(m_{\mathcal{K}})^t$. ■
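
Lemma 2 can be checked empirically on a toy example. The sketch below draws random products of matrices from a small hypothetical set (three one-way "push" updates on a directed 3-cycle, an assumption made here for illustration) and verifies that the smallest non-null entry of the product of t matrices is never below the t-th power of the smallest non-null entry of the set.

```python
import random

def matmul(A, B):
    """Plain-Python product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def min_nonzero(M):
    """Smallest strictly positive entry of a non-negative matrix."""
    return min(v for row in M for v in row if v > 0)

# Hypothetical update set: sensor i keeps half of its mass and pushes half to i+1.
K_set = [
    [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]],
]
m_K = min(min_nonzero(K) for K in K_set)      # smallest non-null entry of the set

rng = random.Random(1)
P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
lemma_holds = True
for t in range(1, 11):
    P = matmul(P, rng.choice(K_set))          # P(t) = K(1) ... K(t)
    lemma_holds = lemma_holds and (min_nonzero(P) >= m_K ** t)

print("m_K =", m_K, " Lemma 2 bound respected for t = 1..10:", lemma_holds)
```

All entries are dyadic fractions, so the comparison is exact in floating-point arithmetic for this example.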
Thanks to Item (B2) of Lemma 1, there is a finite $L$ such that there exists a realization of $P(L)$ which is
a positive matrix. Considering the time at multiples of $L$, we know that for any $n$, if $P(nL+1, (n+1)L) > 0$
then for all $i$, $w_i((n+1)L) \geq (m_{\mathcal{K}})^L$. Let us define the following stopping times:

$\tau_0 = 0$
$\tau_n = L \times \min \left\{ j : \sum_{k=1}^{j} 1_{\{P(kL+1,(k+1)L) > 0\}} = n \right\}$

where $1_E$ is the indicator function of event $E$. And,

$\Delta_n = \tau_n - \tau_{n-1}$, $n = 1, \ldots, \infty$.

The $1_{\{P(kL+1,(k+1)L) > 0\}}$ are i.i.d. Bernoulli random variables with strictly positive parameter $p$. Thus
the inter-arrival times $\Delta_n$ are i.i.d. and geometrically distributed in units of $L$, i.e. $\mathbb{P}[\Delta_1 = kL] = (1-p)^{k-1} p$ for $k \geq 1$.
Observe that the $(\tau_n)_{n>0}$ are all finite and converge to infinity with probability one. We then have proven
the following result:

Proposition 1. Under Assumptions (A1), (A2), and (B), there exists a sequence of positive i.i.d.
geometrically distributed random variables $(\Delta_n)_{n>0}$ such that for all $n > 0$

$\Psi_1(\tau_n) \leq \|x(0)\|_2^2 \, (m_{\mathcal{K}})^{-2L}$

where $\tau_n = \sum_{l=1}^{n} \Delta_l$.

C. Analysis of $\Psi_2(t)$

This section deals with new results about $\Psi_2(t)$. These results considerably extend those given in [12]
since we consider more general models for $K(t)$ and any type of connected graph. According to Eq. (8),
we have

$\Psi_2(t) = \|(I-J)P(t)\|_F^2$    (9)

where $\|\cdot\|_F$ denotes the Frobenius matrix norm.

One technique (used in e.g. |Q ) consists in writing $E[\Psi_2(t)] = \mathrm{Trace}\left((I-J)E[P(t)P^T(t)](I-J)\right)$
thanks to Eq. (9) and finding a linear recursion between $E[\Psi_2(t)|\Psi_2(t-1)]$ and $\Psi_2(t-1)$. However this
technique does not work in the most general case (see footnote 1).

Therefore, as proposed alternatively in [4] (though not essential in [4]) in the context of Random
Gossip algorithms (see Section II-B1), we write $\Psi_2(t)$ with respect to a more complicated matrix for
which the recursion property is easier to analyze. Indeed, recalling that for any matrix $M$,

$\|M\|_F^2 = \mathrm{Trace}(MM^T)$
and $\mathrm{Trace}(M \otimes M) = (\mathrm{Trace}(M))^2$

one can find that

$\Psi_2(t) = \|\Xi(t)\|_F$

with

$\Xi(t) = (I-J)P(t) \otimes (I-J)P(t)$.

By remarking that $(I-J)P(t)(I-J) = (I-J)P(t)$, and by using standard properties on the Kronecker
product, we have

$\Xi(t) = (I-J)P(t-1)(I-J)K(t) \otimes (I-J)P(t-1)(I-J)K(t)$

$= \Xi(t-1) \left[ ((I-J) \otimes (I-J)) (K(t) \otimes K(t)) \right]$.    (10)

By considering the mathematical expectation given the natural filtration of the past events $\mathcal{F}_{t-1} =
\sigma(K(1), \ldots, K(t-1))$, we obtain

$E[\Xi(t)|\mathcal{F}_{t-1}] = \Xi(t-1) ((I-J) \otimes (I-J)) \, E[K \otimes K]$.

As $\Xi(0) = (I-J) \otimes (I-J)$ and $((I-J) \otimes (I-J))^2 = (I-J) \otimes (I-J)$, we finally have

$E[\Xi(t)] = R^t$    (11)

Footnote 1: We have $E[\Psi_2(t)|\Psi_2(t-1)] = \mathrm{Trace}\left((I-J)P(t-1)(I-J)E[KK^T](I-J)P^T(t-1)(I-J)\right)$. By introducing the
matrix $M = (I-J)E[KK^T](I-J)$, it is easy to link $E[\Psi_2(t)|\Psi_2(t-1)]$ with $\Psi_2(t-1)$ since $E[\Psi_2(t)|\Psi_2(t-1)] \leq
\|M\|_{sp} \Psi_2(t-1)$ where $\|\cdot\|_{sp}$ is the spectral norm (see [15, Chap. 7.7] for details). Unfortunately, in some cases, $\|M\|_{sp}$
can be greater than 1; indeed for the BWGossip algorithm (introduced in Section IV-A), one can have $\|M\|_{sp} > 1$ for some
underlying graphs. Nevertheless, this BWGossip algorithm converges as we will see later. As a consequence, the inequality
$E[\Psi_2(t)|\Psi_2(t-1)] \leq \|M\|_{sp} \Psi_2(t-1)$ is not tight enough to prove a general convergence result and another way has to be found.
with

$R = ((I-J) \otimes (I-J)) \, E[K \otimes K]$.    (12)

Now one can find a simple relationship between $E[\Psi_2(t)]$ and the entries of the matrix $E[\Xi(t)]$ by
considering $Q(t) = (I-J)P(t)$ and $(Q(t))_{ij} = q_{ij}(t)$. After simple algebraic manipulations, we show
that

$(E[\Xi(t)])_{i+(k-1)N,\, j+(l-1)N} = E[q_{kl}(t) q_{ij}(t)]$, $\forall (i,j,k,l) \in \{1, \ldots, N\}^4$.

According to Eq. (9), we have $E[\Psi_2(t)] = E[\|Q(t)\|_F^2]$, which implies that

$E[\Psi_2(t)] = \sum_{i,j=1}^{N} E[q_{ij}^2(t)] = \sum_{i,j=1}^{N} (E[\Xi(t)])_{i+(i-1)N,\, j+(j-1)N}$.

As a consequence, the behavior of the entries of $E[\Xi(t)]$ drives the behavior of $E[\Psi_2(t)]$.
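
To make the matrix $R$ of Eq. (12) concrete, the following sketch builds it for a hypothetical 3-sensor example (three equiprobable one-way push updates on a directed cycle, an assumption made only for this illustration) and estimates its spectral radius. The estimate relies on Gelfand's formula, $\rho(R) = \lim_t \|R^t\|^{1/t}$, evaluated at $t = 256$ with the max-entry norm; it is a rough numerical stand-in, not an exact eigenvalue computation.

```python
def matmul(A, B):
    """Plain-Python product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(A, B):
    """Kronecker product of two square matrices."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def estimate_spectral_radius(R, squarings=8):
    """Gelfand-style estimate: (max |entry| of R^(2^squarings))^(1 / 2^squarings)."""
    M = R
    for _ in range(squarings):
        M = matmul(M, M)
    return max(abs(v) for row in M for v in row) ** (1.0 / 2 ** squarings)

N = 3
K_set = [  # hypothetical equiprobable updates: sensor i pushes half of its mass to i+1
    [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]],
]
krons = [kron(K, K) for K in K_set]
E_KK = [[sum(Kk[a][b] for Kk in krons) / len(krons) for b in range(N * N)]
        for a in range(N * N)]
I_minus_J = [[(1.0 if i == j else 0.0) - 1.0 / N for j in range(N)] for i in range(N)]
R = matmul(kron(I_minus_J, I_minus_J), E_KK)

rho_estimate = estimate_spectral_radius(R)
kernel_residual = max(abs(sum(row)) for row in R)   # R applied to the all-ones vector

print("estimated spectral radius of R:", rho_estimate)
print("max |(R 1)_i| :", kernel_residual)
```

For this example the update set satisfies (A1), (A2) and (B), so the estimate is expected to fall below 1, and the all-ones vector is expected to lie in the kernel of R up to rounding.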

Let us define the vector norm on $N \times N$ matrices as $|||M|||_\infty = N \max_{i,j} |m_{ij}|$. The norm $|||\cdot|||_\infty$ is
a matrix norm (see [15, Chap. 5.6]) and hence is submultiplicative. Now, using the Jordan normal form
of $R$ (see [15, Chap. 3.1 and 3.2]), we get that there is an invertible matrix $S$ such that

$|||R^t|||_\infty = |||S \Lambda^t S^{-1}|||_\infty \leq |||S|||_\infty \, |||S^{-1}|||_\infty \, |||\Lambda^t|||_\infty$    (13)

where $\Lambda$ is the Jordan matrix associated with $R$.

After some computations, it is easy to see that the absolute value of all the entries of $\Lambda^t$ are bounded
in the following way:

$\max_{1 \leq i,j \leq N} |(\Lambda^t)_{ij}| \leq \max_{0 \leq j \leq J-1} \binom{t}{j} \rho(R)^{t-j}$

with $\rho(R)$ the spectral radius of $R$ and $J$ the maximum size of the associated Jordan blocks. Hence,
$\forall t > 0$,

$\max_{1 \leq i,j \leq N} |(\Lambda^t)_{ij}| \leq t^{J-1} \rho(R)^{t-J+1}$    (14)

When $R$ is diagonalizable, $J = 1$, and we get that

$\max_{1 \leq i,j \leq N} |(\Lambda^t)_{ij}| \leq \rho(R)^t$ (when $R$ is diagonalizable)    (15)


Putting together Eqs. (11), (13), (14), (15), and remarking that the subspace spanned by $\mathbf{1}_{N^2} = \mathbf{1} \otimes \mathbf{1}$
is in the kernel of $R$, we get that the size of the greatest Jordan block is $\leq N - 1$, hence the following
lemma:

Lemma 3. We have

$E[\Psi_2(t)] = O\left(t^{N-2} \rho(R)^t\right)$
where $R$ is defined in Eq. (12) and where $\rho(R)$ is the spectral radius of the matrix $R$.

The next step of our analysis is to prove that the spectral radius $\rho(R)$ is strictly less than 1 when
Assumptions (A1), (A2), and (B) hold. Applying Theorem 5.6.12 of [15] on Eq. (11)
proves that $\rho(R) < 1$
if and only if $E[\Xi(t)]$ converges to zero as $t$ goes to infinity. Therefore our next objective is to prove
that $E[\Xi(t)]$ converges to zero by using another way than the study of the spectral radius of $R$.

Actually, one can find another linear recursion on $\Xi(t)$ (different from the one exhibited in Eq. (10)).
We get

$\Xi(t) = \Xi(t-1) \, (K(t) \otimes K(t))$

and, by taking the mathematical expectation given the past, we obtain

$E[\Xi(t)|\mathcal{F}_{t-1}] = \Xi(t-1) \, E[K \otimes K]$.

Remarking that $\Xi(t)\mathbf{1}_{N^2} = 0$, we have for any vector $v$,

$E[\Xi(t)|\mathcal{F}_{t-1}] = \Xi(t-1) \left( E[K \otimes K] - \mathbf{1}_{N^2} v^T \right)$

and then, for any vector $v$,

$E[\Xi(t)] = \Xi(0) S_v^t$    (16)

with $S_v = E[K \otimes K] - \mathbf{1}_{N^2} v^T$ and $\Xi(0) = (I-J) \otimes (I-J)$.

By considering Eq. (16), it is straightforward to see that $E[\Xi(t)]$ converges to zero as $t$ goes to infinity
if there is a vector $v$ such that $\rho(S_v) < 1$. Notice that the recursion given in Eq. (16) is less "strong"
than the one in Eq. (11) since it only leads to a sufficient condition instead of a necessary and sufficient
condition. As $\rho(S_v) < 1$ implies the convergence of $E[\Xi(t)]$ and as the convergence of $E[\Xi(t)]$ implies
that $\rho(R) < 1$, one can thus state the following Lemma:

Lemma 4. If there is a vector $v$ such that $\rho\left(E[K \otimes K] - \mathbf{1}_{N^2} v^T\right) < 1$, then $\rho(R) < 1$.

One of the most important results of the paper lies in the following Lemma in which we ensure that,
under Assumptions (A1), (A2), and (B), there is a vector $v$ such that $\rho\left(E[K \otimes K] - \mathbf{1}_{N^2} v^T\right) < 1$ and
thus $\rho(R) < 1$.

Lemma 5. If Assumptions (A1), (A2), (B) hold, there is a vector $v$ such that $\rho\left(E[K \otimes K] - \mathbf{1}_{N^2} v^T\right) < 1$.

Proof: Assumptions (A1), (A2), and (B) imply that

i) $E[K \otimes K]$ is a non-negative matrix with a constant row sum equal to one (because of the
row-stochasticity). According to Lemma 8.1.21 in [15], we have $\rho(E[K \otimes K]) = 1$.

ii) $E[K \otimes K]$ is a primitive matrix (see (B3) in Lemma 1), which implies that there is only one eigenvalue
of maximum modulus. This eigenvalue is thus equal to 1 and associated with the eigenvector $\mathbf{1}_{N^2}$.

By using the Jordan normal form and the simple multiplicity of the maximum eigenvalue (equal to 1),
we know that i) there exists a vector $v_1$ equal to the left eigenvector corresponding to the eigenvalue 1,
normalized such that $v_1^T \mathbf{1}_{N^2} = 1$, and ii) that the set of the eigenvalues of $E[K \otimes K] - \mathbf{1}_{N^2} v_1^T = S_{v_1}$ is exactly the set of the eigenvalues
of $E[K \otimes K]$ without the maximum one equal to 1. Indeed the maximum eigenvalue of $E[K \otimes K]$ has
been removed by the term $\mathbf{1}_{N^2} v_1^T$ and the associated eigenvector now belongs to the kernel of $S_{v_1}$. As
a consequence, the modulus of the eigenvalues of $S_{v_1}$ is strictly less than 1, i.e., $\rho(S_{v_1}) < 1$. ■

Aggregating successively the results provided in Lemmas 5, 4 and 3 leads to the main result of this
Section devoted to the analysis of $\Psi_2(t)$. Indeed, Lemma 5 ensures that there is a vector $v$ such that
$\rho(S_v) < 1$, then Lemma 4 states that $\rho(R) < 1$. Then, Lemma 3 concludes the proof for the next result.

Proposition 2. Under Assumptions (A1), (A2) and (B),

$E[\Psi_2(t)] = O\left(t^{N-2} e^{-\kappa t}\right)$

with $\kappa = -\log(\rho(R)) > 0$.

D. Final results

Thanks to the various intermediate Lemmas and Propositions provided above, we are now able to
state the main Theorems of the paper. The first one deals with the determination of the necessary and
sufficient conditions for Sum-Weight-like algorithms to converge. The second one gives us an insight on
the decrease speed of the Squared Error (defined in Eq. (5)). Before stating them, we need the following
lemma:

Lemma 6. $\|x(t) - x_{\mathrm{ave}}\mathbf{1}\|_\infty = \max_i |x_i(t) - x_{\mathrm{ave}}|$ is a non-increasing sequence with respect to $t$.

Proof: One can remark that, at time $t + 1$, we have

$\forall j$, $x_j(t+1) = \frac{\sum_{i=1}^{N} K_{ij} s_i(t)}{\sum_{i=1}^{N} K_{ij} w_i(t)} = \sum_{i=1}^{N} \left( \frac{K_{ij} w_i(t)}{\sum_{l=1}^{N} K_{lj} w_l(t)} \right) x_i(t)$

where $K$ corresponds to any matrix in $\mathcal{K}$. So $x_j(t+1)$ is a center of mass of $(x_i(t))_{i=1,\ldots,N}$. Therefore,
$\forall j \in \{1, \ldots, N\}$,

$|x_j(t+1) - x_{\mathrm{ave}}| \leq \sum_{i=1}^{N} \left( \frac{K_{ij} w_i(t)}{\sum_{l=1}^{N} K_{lj} w_l(t)} \right) |x_i(t) - x_{\mathrm{ave}}|$

$\leq \max_i |x_i(t) - x_{\mathrm{ave}}|$.

■
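
The center-of-mass argument of Lemma 6 can be illustrated on a single update. In the sketch below, the state $(s, w)$ and the update matrix $K$ are hypothetical values chosen for the example; the code checks that every new estimate stays within the range of the previous estimates and that the maximum deviation from the average does not increase.

```python
def estimates_after_update(s, w, K):
    """Return the estimates x_j(t+1) = (s^T K)_j / (w^T K)_j after one update."""
    n = len(s)
    s_next = [sum(s[i] * K[i][j] for i in range(n)) for j in range(n)]
    w_next = [sum(w[i] * K[i][j] for i in range(n)) for j in range(n)]
    return [a / b for a, b in zip(s_next, w_next)]

# Hypothetical state of a 3-sensor network whose initial values were [1, 5, 3].
s = [0.5, 5.5, 3.0]
w = [0.5, 1.5, 1.0]
x_ave = sum(s) / sum(w)                       # 9 / 3 = 3 by mass conservation

# Sensor 1 pushes half of its pair to sensor 2 (row-stochastic, positive diagonal).
K = [[1.0, 0.0, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]

x_before = [si / wi for si, wi in zip(s, w)]
x_after = estimates_after_update(s, w, K)

max_err_before = max(abs(v - x_ave) for v in x_before)
max_err_after = max(abs(v - x_ave) for v in x_after)
print("before:", x_before, " after:", x_after)
print("max error before/after:", max_err_before, max_err_after)
```

Here the estimate of sensor 2 moves from 3 to a weighted mean of the estimates of sensors 1 and 2, so it cannot leave the interval spanned by the previous estimates.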

1) Result on the convergence: Let us consider that Assumption (B) does not hold. Thanks to (B1)
in Lemma 1, this is equivalent to $\exists (k,l) \in \{1, \ldots, N\}^2$ such that $\forall T$, $(P(T))_{kl} = 0$. Let us take $x(0)$ equal
to the canonical vector composed by a 1 at the $k$-th position and 0 elsewhere. Then for any $t > 0$,
$x_l(t) = 0$ which is different from $x_{\mathrm{ave}} = 1/N$. Consequently, the algorithm fails to converge to the
true consensus for at least one initial measurement. So if the Sum-Weight algorithm converges almost surely to
the true consensus for any initial vector $x(0)$ then Assumption (B) holds.

Let us now assume that Assumption (B) holds. Using Markov's inequality along with Proposition 2, we
have a finite $K$ such that for any $\delta > 0$,

$\sum_{t>0} \mathbb{P}[\Psi_2(t) > \delta] \leq \frac{1}{\delta} \sum_{t>0} E[\Psi_2(t)]$

$\leq \frac{K}{\delta} \sum_{t>0} t^{N-2} e^{-\kappa t} < \infty$.

Consequently, Borel-Cantelli's Lemma leads to the almost sure convergence of $\Psi_2(t)$ to zero. In
addition, the random variables $(\tau_n)_{n>0}$ provided in the statement of Proposition 1 converge to infinity
with probability one, hence $\Psi_2(\tau_n) \to 0$ almost surely as $n \to \infty$. Since $\Psi_1(\tau_n)$ is bounded, $\Psi_1(\tau_n)\Psi_2(\tau_n) \to 0$
almost surely as $n \to \infty$. According to Lemma 6, $\|x(t) - x_{\mathrm{ave}}\mathbf{1}\|_\infty$ is a non-increasing non-negative sequence verifying
$\|x(t) - x_{\mathrm{ave}}\mathbf{1}\|_\infty^2 \leq \Psi_1(t)\Psi_2(t)$; as there is a converging subsequence with limit 0, the sequence itself
converges to the same limit, which implies the following theorem.

Theorem 1. Under Assumptions (A1) and (A2), $x(t)$ converges almost surely to the average consensus
$x_{\mathrm{ave}}\mathbf{1}$ for any $x(0)$, if and only if Assumption (B) holds.

We have an additional result on another type of convergence for $x(t)$. As $\|x(t) - x_{\mathrm{ave}}\mathbf{1}\|_\infty$ is a
non-increasing sequence, we have, for any $t$, $\|x(t) - x_{\mathrm{ave}}\mathbf{1}\|_\infty \leq \|x(0) - x_{\mathrm{ave}}\mathbf{1}\|_\infty$, which implies that $x(t)$
is bounded for any $t > 0$. As a consequence, according to [16], since $x(t)$ also converges almost surely
to $x_{\mathrm{ave}}\mathbf{1}$, we know that $x(t)$ converges to $x_{\mathrm{ave}}\mathbf{1}$ in $L^p$ for any positive integer $p$. The convergence of
the mean squared error of $x(t)$ thus corresponds to the case $p = 2$.

Corollary 1. If $x(t)$ converges almost surely to the average consensus $x_{\mathrm{ave}}\mathbf{1}$ then the mean squared
error (MSE) converges to zero.

2) Result on the convergence speed: The next result on the convergence speed corresponds to the
main challenge and novelty of the paper. Except in [12] for a very specific case (cf. Section V-A for
more details), our paper provides the first general results about the theoretical convergence speed for the
squared error of the Sum-Weight-like algorithms. For the sake of this theorem we introduce the following
notation: given two sequences of random variables $(X_n)_{n>0}$ and $(Y_n)_{n>0}$, we will say that $X_n = o_{a.s.}(Y_n)$
if $X_n / Y_n \to 0$ almost surely.

Theorem 2. Under Assumptions (A1), (A2), and (B), the squared error (SE) is non-increasing.
Furthermore, it is bounded by an exponentially decreasing function as follows

$SE(\tau_n) = \|x(\tau_n) - x_{\mathrm{ave}}\mathbf{1}\|_2^2 = o_{a.s.}\left(\tau_n^N e^{-\kappa \tau_n}\right)$

with $\kappa = -\log\left(\rho\left(((I-J) \otimes (I-J)) \, E[K \otimes K]\right)\right) > 0$ and $\tau_n = \sum_{l=1}^{n} \Delta_l$ as defined in Proposition 1.

This result tells us that the decay rate of $\log(SE(t))$ is lower-bounded by $\kappa$ infinitely often, which provides
a good insight about the asymptotic behavior of $x(t)$. Indeed, the squared error will vanish exponentially
and we have derived a lower bound for this speed. We believe this result is new, since it allows the
speed of any such algorithm to be predicted. The particular behavior of the weight variables in this very general setting does not
enable us to provide a clearer result about the mean squared error; however for some particular algorithms
(e.g. single-variate ones) this derivation is possible (see Section V for more details). The authors would
like to draw the reader's attention to the fact that the main contribution of the paper lies in the exponential
decrease constant $\kappa$.

Proof: To prove this result we will once more use the decomposition of the squared error introduced
in Eq. (6). We know from Proposition 2 that $E[t^{-N} e^{\kappa t} \Psi_2(t)] = O(t^{-2})$. By Markov's inequality and
Borel-Cantelli's lemma,

$t^{-N} e^{\kappa t} \Psi_2(t) \to 0$ almost surely as $t \to \infty$.

Composing with the $(\tau_n)_{n>0}$, we get

$\tau_n^{-N} e^{\kappa \tau_n} \Psi_2(\tau_n) \to 0$ almost surely as $n \to \infty$.

Since $\exists C$, $\forall n > 0$, $\Psi_1(\tau_n) \leq C$, we get the claimed result. ■
IV. Proposed algorithms

In Subsection IV-A, we propose a new Sum-Weight-like algorithm using the broadcast nature of
the wireless channel which converges and offers remarkable performance. This algorithm is hereafter
called Broadcast-Weighted Gossip (BWGossip). In Subsection IV-B, a new distributed management of
the nodes' clocks which can improve averaging algorithms is proposed. Finally, Subsection IV-C provides
an extension of this work to the distributed sum computation.



A. BWGossip algorithm 

Remarking i) that the broadcast nature of the wireless channel was often not taken into account in
the distributed estimation algorithms (apart from [9], but this algorithm does not converge to the average)
and ii) that information propagation is much faster while broadcasting compared to pairwise exchanges
ifTTl . we propose an algorithm taking into account the broadcast nature of the wireless channel. At each 
global clock tick, it simply consists in uniformly choosing a sensor that broadcasts its pair of values in 
an appropriate way; then, the receiving sensors add their received pair of values to their current one. A 
more algorithmic formulation is presented below. 



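Algorithm 1 BWGossip

When the sensor $i$ wakes up (at global time $t$):

► The sensor $i$ broadcasts $\left( \frac{s_i(t)}{|\mathcal{N}_i|+1} \, ; \, \frac{w_i(t)}{|\mathcal{N}_i|+1} \right)$

► The sensors of the neighborhood $\mathcal{N}_i$ update: $\forall j \in \mathcal{N}_i$,

$$s_j(t+1) = s_j(t) + \frac{s_i(t)}{|\mathcal{N}_i|+1}, \qquad w_j(t+1) = w_j(t) + \frac{w_i(t)}{|\mathcal{N}_i|+1}$$

► The sensor $i$ updates:

$$s_i(t+1) = \frac{s_i(t)}{|\mathcal{N}_i|+1}, \qquad w_i(t+1) = \frac{w_i(t)}{|\mathcal{N}_i|+1}$$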

According to this formulation, the update matrix $K_i$ associated with the action of the $i$-th sensor takes the following form

$$K_i = I - e_i e_i^T + e_i e_i^T (I + D)^{-1} (A + I) = I - e_i e_i^T (I + D)^{-1} L \qquad (17)$$

with $e_i$ the $i$-th canonical vector. Clearly, the update matrices satisfy Assumptions (A1) and (A2).

Thanks to Eq. (17) and recalling that $L = D - A$, we obtain that

$$E[K] = I - \frac{1}{N} (I + D)^{-1} L = \frac{N-1}{N} I + \frac{1}{N} (I + D)^{-1} (I + A).$$

As all the involved matrices are non-negative, we have (entrywise) $(I + D)^{-1} (I + A) \geq (I + A) / ((d_{\max} + 1) N)$. As a consequence, we have

$$E[K] \geq \frac{1}{(d_{\max} + 1) N} (I + A) \geq 0.$$

Since $A$ is the adjacency matrix of a connected graph, $\exists m > 0$, $(I + A)^m > 0$. Hence, for the same $m$, $E[K]^m \geq \frac{1}{(d_{\max} N + N)^m} (I + A)^m > 0$, which implies that $E[K]$ is a primitive matrix. Applying the lemma proved in Appendix A enables us to prove that Assumption (B) also holds.
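The properties claimed above can be checked numerically on a small example. The sketch below builds the matrices $K_i$ of Eq. (17) for a hypothetical 4-node path graph, averages them under a uniform wake-up assumption, and tests primitivity by looking at powers of the sparsity pattern. The function names, the example graph, and the use of Wielandt's bound $(N-1)^2+1$ to stop the power test are illustrative choices, not part of the original analysis.

```python
import numpy as np

def bwgossip_update_matrices(A):
    """Return the list [K_1, ..., K_N] of Eq. (17) for adjacency matrix A."""
    N = A.shape[0]
    I = np.eye(N)
    D = np.diag(A.sum(axis=1))           # degree matrix
    L = D - A                            # graph Laplacian
    inv_ID = np.linalg.inv(I + D)
    Ks = []
    for i in range(N):
        e = np.zeros((N, 1))
        e[i] = 1.0                       # i-th canonical vector
        Ks.append(I - e @ e.T @ inv_ID @ L)
    return Ks

def is_primitive(M):
    """True if some power of the non-negative matrix M is entrywise positive."""
    N = M.shape[0]
    pattern = (M > 0).astype(int)
    power = np.eye(N, dtype=int)
    for _ in range((N - 1) ** 2 + 1):    # Wielandt's bound on the exponent
        power = ((power @ pattern) > 0).astype(int)
        if power.all():
            return True
    return False

# Hypothetical example: path graph 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Ks = bwgossip_update_matrices(A)
EK = sum(Ks) / len(Ks)                   # each sensor wakes up with probability 1/N
```

On this example, every $K_i$ is non-negative and row-stochastic with a positive diagonal, and `EK` agrees with the closed form $\frac{N-1}{N} I + \frac{1}{N}(I+D)^{-1}(I+A)$.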

Hence, Theorem 1 states that the BWGossip algorithm converges almost surely to the average consensus, and Theorem 2 gives us an insight into the decrease speed of the squared error.
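To make the behavior of Algorithm 1 concrete, here is a minimal simulation sketch. It assumes a hypothetical 5-node ring, weights initialized to 1, and a uniformly drawn waking sensor at each tick; `bwgossip_step` and `run_bwgossip` are illustrative names rather than anything defined in the paper.

```python
import numpy as np

def bwgossip_step(s, w, neighbors, i):
    """Apply one BWGossip update in place when sensor i wakes up."""
    share = 1.0 / (len(neighbors[i]) + 1)
    ds, dw = s[i] * share, w[i] * share   # broadcast pair
    for j in neighbors[i]:                # neighbors add the received pair
        s[j] += ds
        w[j] += dw
    s[i], w[i] = ds, dw                   # the sender keeps the same share

def run_bwgossip(x0, neighbors, n_ticks, seed=0):
    """Run n_ticks global clock ticks from s(0) = x0, w(0) = 1."""
    rng = np.random.default_rng(seed)
    s = np.asarray(x0, dtype=float).copy()
    w = np.ones_like(s)
    for _ in range(n_ticks):
        bwgossip_step(s, w, neighbors, int(rng.integers(len(s))))
    return s, w

# Hypothetical 5-node ring
neighbors = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
x0 = [1.0, 4.0, 2.0, 8.0, 5.0]
s, w = run_bwgossip(x0, neighbors, n_ticks=2000)
estimates = s / w                         # each sensor's estimate of the average
```

The sums of `s` and `w` are conserved by every update, and the ratios `s / w` approach the average of `x0` at every sensor.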

B. Adaptation to smart clock management 

So far, all the Poisson coefficients of the clocks were identical. This means that all sensors were waking up uniformly and independently of their past actions. Intuitively, it would be more logical for a sensor that has talked a lot to become less active for a long period.

Another advantage of the Sum-Weight algorithms is that each sensor knows how much it talks compared to the others, which is a useful piece of information. Indeed, each sensor knows whether it talks frequently or not (at no additional cost) through its own weight value: when a sensor talks, its weight decreases and, conversely, when it receives information, its weight increases. Therefore, our idea is to control the Poisson coefficient of each sensor with respect to its weight.

We thus propose to consider the following rule for each Poisson coefficient

$$\forall i \in V, \quad \lambda_i(t) = \alpha + (1 - \alpha) w_i(t) \qquad (18)$$

where $\alpha \in (0, 1)$ is a tuning coefficient.

Notice that the global clock remains unchanged since $\forall t > 0$, $\sum_{i=1}^{N} \lambda_i(t) = N$ (the weights sum to $N$ at all times). The global message exchange rate is thus kept unchanged while the clock rate of each sensor is adapted. The complexity of the algorithm is the same because the sensor whose weight changes just has to launch a Poisson clock.
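The rule of Eq. (18) can be illustrated with a few lines of code. The sketch below computes the Poisson coefficients from hypothetical weights and draws the next waking sensor; it relies on the standard fact that, among independent Poisson clocks, the next tick belongs to clock $i$ with probability $\lambda_i / \sum_k \lambda_k$. The function names and the weight values are assumptions made for the example.

```python
import numpy as np

def clock_rates(w, alpha):
    """Poisson coefficients of Eq. (18): lambda_i = alpha + (1 - alpha) * w_i."""
    return alpha + (1.0 - alpha) * np.asarray(w, dtype=float)

def next_waking_sensor(w, alpha, rng):
    """Draw the next sensor to wake up.

    Assumption: with independent Poisson clocks of rates lambda_i, the next
    tick belongs to sensor i with probability lambda_i / sum(lambda).
    """
    lam = clock_rates(w, alpha)
    return int(rng.choice(len(lam), p=lam / lam.sum()))

w = np.array([0.5, 1.5, 2.0, 0.0])      # hypothetical weights, summing to N = 4
rates = clock_rates(w, alpha=0.3)
sensor = next_waking_sensor(w, 0.3, np.random.default_rng(0))
```

Because the weights sum to $N$, the rates also sum to $N$ for any $\alpha$, which is the invariance of the global clock noted above; setting $\alpha = 1$ recovers identical clocks.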

Even though the convergence and the convergence speed with this clock improvement have not been formally established, our simulations with the BWGossip algorithm (see Fig. 2) suggest that it still converges exponentially to the average, and more quickly if $\alpha$ is well chosen.

C. Distributed estimation of the sum

In some cases, distributively computing the sum of the initial values is of interest. For example, in the case of signal detection, the Log Likelihood Ratio (LLR) of a set of sensors is separable into the sum of the LLRs of the sensors. Hence, in order to perform a signal detection test based on the information of the whole network (using a Generalized LLR Test for instance), every sensor needs to estimate the sum of the LLRs computed by the sensors.

An estimate of the sum can be trivially obtained by multiplying the average estimate by the number of sensors, but this number might not be available at every sensor. Another interest of the Sum-Weight scheme is that the initialization of the weights of the sensors enables us to compute different functions related to the average. Intuitively, as the sums of the entries of the $s(t)$ and $w(t)$ vectors are conserved through time and the convergence to a consensus is guaranteed by the assumptions on the update matrices, the sensors converge to $\sum_i s_i(0) / \sum_i w_i(0)$. This is obviously equal to the average $(1/N) \sum_i x_i(0)$ with the initialization $s(0) = x(0)$, $w(0) = \mathbf{1}$ used so far.

Now, if a sensor wants to trigger an estimation of the sum through the network, it simply sets its weight to 1 and sends a starting signal to the other nodes, which set their weights to 0. Mathematically, we then have the following initialization after sensor $i$ triggers the algorithm

$$s(0) = x(0), \qquad w(0) = e_i$$

where $e_i$ is the $i$-th canonical vector. In this setting, all Sum-Weight-like algorithms converge exponentially to the sum of the initial values, as all the theorems of the paper hold with only minor modifications in the proofs.
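The effect of the initialization $w(0) = e_i$ can be sketched as follows. The example reuses BWGossip-style updates on a hypothetical 5-node ring, with sensor 2 arbitrarily chosen as the triggering node; the function name and all numerical values are assumptions for illustration only.

```python
import numpy as np

def sum_weight_gossip(s0, w0, neighbors, n_ticks, seed=0):
    """Run BWGossip-style updates from arbitrary initial (s, w) vectors."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s0, dtype=float).copy()
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(n_ticks):
        i = int(rng.integers(len(s)))          # uniformly chosen waking sensor
        share = 1.0 / (len(neighbors[i]) + 1)
        ds, dw = s[i] * share, w[i] * share
        for j in neighbors[i]:
            s[j] += ds
            w[j] += dw
        s[i], w[i] = ds, dw
    return s, w

neighbors = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
x0 = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
w0 = np.zeros(5)
w0[2] = 1.0                                    # sensor 2 triggers: w(0) = e_2
s, w = sum_weight_gossip(x0, w0, neighbors, n_ticks=2000)
estimates = s / w                              # estimates of the sum of x0
```

Since the total weight is now 1 instead of $N$, the ratio $s_j(t)/w_j(t)$ approaches the sum of the initial values. Note that a sensor's ratio is undefined until it has received a non-zero weight from a neighbor.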

V. Comparison with existing works 

In this section, we show that our results extend previous works in the literature. In Subsections V-A and V-B, we compare our results with existing papers dealing with the design and the analysis of Sum-Weight-like algorithms. In Subsection V-C, we observe that our results can even be applied to the traditional framework of single-variate gossip algorithms.

A. Comparison with Kempe's work 

In Kempe's work [12], the setup is quite different since the sensors' updates are synchronous, that is, at each time $t$, all the sensors send and update their values. Another important difference lies in the fact that the communication graph is assumed to be complete and to offer self-loops, i.e., each sensor can communicate with any other one, including itself. The algorithm introduced in [12] is described in Algorithm 2.

Algorithm 2 Push-Sum Algorithm [12]
At each time $t$, every sensor $i$ activates:

► The sensor $i$ chooses uniformly a node $j_i(t)$ belonging to its neighborhood (including itself)

► The sensor $i$ sends the pair $(s_i(t)/2 \, ; \, w_i(t)/2)$ to $j_i(t)$

► Let $\mathcal{R}$ be the set of sensors that sent information to $i$. The sensor $i$ updates:

$$s_i(t+1) = s_i(t)/2 + \sum_{r \in \mathcal{R}} s_r(t)/2, \qquad w_i(t+1) = w_i(t)/2 + \sum_{r \in \mathcal{R}} w_r(t)/2$$



Consequently, at time $t$, the update matrix takes the following form

$$K(t) = \frac{1}{2} I + \frac{1}{2} \sum_{i=1}^{N} e_i e_{j_i(t)}^T \qquad (19)$$

where the index $j_i(t)$ is defined in Algorithm 2. Notice that the first term of the right-hand side corresponds to the information kept by the sensor, while the second term corresponds to the information sent to the chosen sensor. Moreover, as each sensor selects uniformly its neighbor² (including itself), we obtain that

$$E[K] = \frac{1}{2} I + \frac{1}{2} J.$$

It is then easy to check that

- the (instantaneous) update matrices are non-negative and row-stochastic. In addition, they are chosen uniformly in a set of size $N^N$.

- the (instantaneous) update matrices have a strictly positive diagonal.

- $E[K] > 0$, thus $E[K]$ is a primitive matrix.

This proves that Kempe's algorithm satisfies Assumptions (A1), (A2) and (B), and so it converges almost surely to the average consensus (which was also proven in [12]).
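A synchronous round of Algorithm 2 on the complete graph can be sketched in a few lines. The network size, the random initial values, and the function name below are illustrative assumptions; `np.add.at` is used so that several senders targeting the same node are all accumulated.

```python
import numpy as np

def push_sum_round(s, w, rng):
    """One synchronous Push-Sum round on the complete graph with self-loops."""
    N = len(s)
    targets = rng.integers(N, size=N)    # j_i(t): uniform over all nodes, itself included
    new_s, new_w = s / 2, w / 2          # half of each pair is kept
    np.add.at(new_s, targets, s / 2)     # the other half is sent to j_i(t)
    np.add.at(new_w, targets, w / 2)
    return new_s, new_w

rng = np.random.default_rng(1)
x0 = rng.normal(size=10)                 # hypothetical initial measurements
s, w = x0.copy(), np.ones(10)
for _ in range(100):
    s, w = push_sum_round(s, w, rng)
estimates = s / w
```

After a moderate number of rounds, every ratio `s / w` is numerically indistinguishable from the average of `x0`, while the sums of `s` and `w` stay constant.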

Let us now focus on the convergence speed of Kempe's algorithm. Recall that the convergence speed is driven by $\Psi_2(t)$ (denoted by $\Phi_t$ in [12]). As this algorithm is synchronous and only applies to a complete communication graph, it is simple to obtain a recursion between $E[\Psi_2(t) | \Psi_2(t-1)]$ and $\Psi_2(t-1)$. Indeed, the approach given in the footnote of Section III-C can be applied. More precisely,

² As the graph is complete, this means choosing one node uniformly in the graph.






SUBMITTED FOR PUBLICATION TO IEEE TRANSACTIONS ON SIGNAL PROCESSING, SEPTEMBER 2012 






the corresponding matrix $M = (I - J) E[KK^T] (I - J)$ is given in closed form as (see Appendix B-A for details)

$$M = (I - J) E[KK^T] (I - J) = \left( \frac{1}{2} - \frac{1}{4N} \right) (I - J), \qquad (20)$$

and then one can easily check³ that

$$E[\Psi_2(t) | \Psi_2(t-1)] = \left( \frac{1}{2} - \frac{1}{4N} \right) \Psi_2(t-1). \qquad (21)$$

Moreover, thanks to Eq. (20), we have that $\rho(M) = 1/2 - 1/(4N) < 1$; thus the inequality in the above-mentioned footnote becomes an equality and the spectral radius of $M$ is less than 1. Therefore, the true convergence speed is provided by $\rho(M)$. Comparing this convergence speed (obtained very easily in [12]) with the convergence speed bounds obtained in our paper is of great interest and is done below. First of all, recall (see the footnote in Section III-C) that in the general case treated in our paper, it is impossible to find a recursion similar to Eq. (21), which justifies our alternative approach. Secondly, following the general alternative approach developed in this paper, we know that the matrix of interest is $R = ((I - J) \otimes (I - J)) E[K \otimes K]$ (see Proposition 2). After some computations (a detailed proof is available in Appendix B-B), we have that

$$R = \frac{1}{4} (I - J) \otimes (I - J) + \frac{N-1}{4N} v v^T \qquad (22)$$

with $v = \frac{1}{\sqrt{N-1}} \left( u - \frac{1}{N} \mathbf{1}_{N^2} \right)$ and $u = \sum_{i=1}^{N} e_i \otimes e_i$.

Consequently, $R$ is a linear combination of the two following orthogonal projections:

• the first projection, generated by $(I - J) \otimes (I - J)$, is of rank $N^2 - 2N + 1$,

• the second projection, generated by $v v^T$, is of rank 1.

As $(I - J) \otimes (I - J)$ and $v v^T$ are orthogonal projections, the vector space $\mathbb{R}^{N^2}$ (on which the matrix $R$ operates) can be decomposed into a direct sum of four subspaces:

• $S_0 = \mathrm{Im}(v v^T) \cap \mathrm{Ker}((I - J) \otimes (I - J))$
• $S_1 = \mathrm{Im}(v v^T) \cap \mathrm{Im}((I - J) \otimes (I - J))$
• $S_2 = \mathrm{Ker}(v v^T) \cap \mathrm{Im}((I - J) \otimes (I - J))$
• $S_3 = \mathrm{Ker}(v v^T) \cap \mathrm{Ker}((I - J) \otimes (I - J))$

As $((I - J) \otimes (I - J)) v = v$ (see Appendix B-B), we have $S_0 = \{0\}$.

³ Note that there is a typo in Lemma 2.3 of [12]. Indeed, the coefficient is $(1/2 - 1/(2N))$ in [12] instead of $(1/2 - 1/(4N))$.

Moreover, according to Eq. (22), we obtain that

$$R x = \begin{cases} \left( \frac{1}{2} - \frac{1}{4N} \right) x & \forall x \in S_1 \\ \frac{1}{4} x & \forall x \in S_2 \\ 0 & \forall x \in S_3 \end{cases}$$

As a consequence, the non-null eigenvalues of $R$ are $1/4$ and $(1/2 - 1/(4N))$, which implies that $\rho(R) = 1/2 - 1/(4N)$. Hence, the convergence speed bound obtained by the general alternative approach developed in this paper provides the true convergence speed for Kempe's algorithm [12].
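For small networks, the spectrum of $R$ can be verified by brute force. The sketch below enumerates the $N^N$ equiprobable update matrices of Eq. (19), averages their Kronecker squares, and forms $R$; the function names are illustrative and the enumeration is only practical for very small $N$.

```python
import itertools
import numpy as np

def push_sum_R(N):
    """Exact R = ((I-J) kron (I-J)) E[K kron K] for Push-Sum, by enumeration."""
    I = np.eye(N)
    J = np.ones((N, N)) / N
    EKK = np.zeros((N * N, N * N))
    for targets in itertools.product(range(N), repeat=N):
        K = 0.5 * I                      # information kept by each sensor
        for i, j in enumerate(targets):
            K[i, j] += 0.5               # information sent by i to j_i(t)
        EKK += np.kron(K, K)
    EKK /= N ** N                        # all target choices are equiprobable
    return np.kron(I - J, I - J) @ EKK

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

rho_3 = spectral_radius(push_sum_R(3))   # to be compared with 1/2 - 1/12
rho_4 = spectral_radius(push_sum_R(4))   # to be compared with 1/2 - 1/16
```

In both cases the computed spectral radius agrees with $1/2 - 1/(4N)$, and the eigenvalue $1/4$ appears with multiplicity $(N-1)^2 - 1$, the dimension of $S_2$.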

B. Comparison with Benezit's algorithm 

In Q, it has been shown that doing a multi-hop communication between sensors provides significant 
performance gain. However, the proposed algorithm relied on a single-variate algorithm. In order to ensure the convergence of this algorithm, the double-stochasticity of the update matrix is necessary, which implies a feedback along the route. The feedback can suffer from link failures (due to high mobility in wireless networks). To counteract this issue, Benezit proposes to get rid of the feedback by using the Sum-Weight approach [13]. In that paper, the authors established a general convergence theorem close to ours. In contrast, they did not provide any result about convergence speed. It is worth noting that our convergence speed results apply to Benezit's algorithm.

C. Comparison with the single-variate algorithms 
If the following additional assumption holds,
(A3) The matrices of $\mathcal{K}$ are column-stochastic,

one can easily show that all the weights $w(t)$ remain constant and equal to 1, i.e.,

$$\forall t > 0, \quad w(t)^T = w(0)^T P(t) = \mathbf{1}^T P(t) = \mathbf{1}^T \quad \text{and} \quad x(t) = s(t) = K(t)^T x(t-1).$$

Therefore, the single-variate algorithms with doubly-stochastic update matrices, such as the Random Gossip and the Geographic Gossip [6], can surprisingly be cast into the Sum-Weight framework. Moreover, as $\Psi_1(t) = \|x(0)\|_2^2$ because all the weights stay equal to 1, the proposed results about $\Psi_2(t)$ (that is, Section III-C) can be applied directly to the squared error of these algorithms.

Let us re-interpret the work of Boyd et al. [4] (especially their Section 2) in the light of our results. In [4], it is stated that under doubly-stochastic update matrices $K(t)$, the mean squared error at time $t$ is dominated by $\rho\left( E[KK^T] - (1/N) \mathbf{1}\mathbf{1}^T \right)^t$ and converges to 0 when $t$ goes to infinity if

$$\rho\left( E[K] - \frac{1}{N} \mathbf{1}\mathbf{1}^T \right) < 1. \qquad (23)$$

Since $K(t)$ is doubly-stochastic, one can remark that $(I - J) E[KK^T] (I - J) = E[KK^T] - (1/N) \mathbf{1}\mathbf{1}^T$. By following the approach developed in the footnote of Section III-C, we directly obtain the domination proven in [4]. Moreover, the condition corresponding to Eq. (23) actually implies Assumption (B). Indeed, due to Eq. (23) and the double-stochasticity of $K(t)$, one can remark that the maximum eigenvalue of $E[K]$ is unique and equal to 1. Consequently, $E[K]$ is primitive, and thus Assumption (B) holds (see the lemma proved in Appendix A). Furthermore, in (see section II-B), it is stated that the condition corresponding to Eq. (23) is only a sufficient condition and that the necessary and sufficient condition is the following one

$$\rho\left( E[K \otimes K] - \frac{1}{N} \mathbf{1}_{N^2} \mathbf{1}_{N^2}^T \right) < 1 \qquad (24)$$

which is exactly the same expression as that in Lemmas 4 and 5. Along with the reasoning detailed in Section III-D, these two lemmas prove that, under Assumptions (A1) and (A2), the condition corresponding to Eq. (24) is necessary and sufficient when Assumption (A3) is also satisfied.
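The identity $(I - J) E[KK^T] (I - J) = E[KK^T] - (1/N)\mathbf{1}\mathbf{1}^T$ and the condition of Eq. (23) can be checked on a small instance of Random Gossip. The sketch below assumes the usual pairwise model, in which a uniformly chosen sensor averages its value with a uniformly chosen neighbor; this selection rule, the path graph, and the function name are assumptions made for the example.

```python
import numpy as np

def random_gossip_moments(A):
    """Return E[K] and E[K K^T] for pairwise Random Gossip on adjacency A.

    Assumption: the waking sensor i is uniform over the N sensors and picks a
    neighbor j uniformly; both replace their values by the pair average.
    """
    N = A.shape[0]
    EK = np.zeros((N, N))
    EKKt = np.zeros((N, N))
    for i in range(N):
        nbrs = np.flatnonzero(A[i])
        for j in nbrs:
            d = np.zeros((N, 1))
            d[i], d[j] = 1.0, -1.0
            K = np.eye(N) - 0.5 * d @ d.T    # doubly-stochastic averaging matrix
            p = 1.0 / (N * len(nbrs))        # probability of the pair (i, j)
            EK += p * K
            EKKt += p * (K @ K.T)
    return EK, EKKt

# Hypothetical example: path graph 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
N = A.shape[0]
I, J = np.eye(N), np.ones((N, N)) / N
EK, EKKt = random_gossip_moments(A)
lhs = (I - J) @ EKKt @ (I - J)
rhs = EKKt - J
rho_23 = float(np.max(np.abs(np.linalg.eigvals(EK - J))))   # left side of Eq. (23)
```

On this connected graph, `lhs` and `rhs` coincide and `rho_23` is strictly below 1, as expected for doubly-stochastic updates.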

Moreover, according to Eq. (19) in [4] and Eq. (16) in our paper, we know that the mean squared error at time $t$ is upper bounded, in logarithmic scale, by $-\kappa' t$ with $\kappa' = -\log\left( \rho\left( E[K \otimes K] - (1/N) \mathbf{1}_{N^2} \mathbf{1}_{N^2}^T \right) \right)$. However, as stated in Proposition 2, the logarithm of the squared error scales with $-\kappa t$. Though these two spectral radii are less than 1 and so ensure the convergence, $\rho\left( ((I - J) \otimes (I - J)) E[K \otimes K] \right)$ (i.e. $e^{-\kappa}$) exhibited in our paper is in general smaller than $\rho\left( E[K \otimes K] - (1/N) \mathbf{1}_{N^2} \mathbf{1}_{N^2}^T \right)$ (i.e. $e^{-\kappa'}$) introduced in [4]. Hence, thanks to our approach, a tighter convergence speed bound has been derived. Numerical illustrations related to this statement are displayed in Fig. 4.

VI. Numerical results 

In order to investigate the performance of distributed averaging algorithms over Wireless Sensor Networks, the use of Random Geometric Graphs (RGG) is commonly advocated. These graphs are built by placing $N$ points uniformly in the unit square (representing the vertices of the future graph) and then connecting those which are closer than a predefined distance $r$. A choice of $r$ of the form $\sqrt{r_0 \log(N)/N}$ with $r_0 \in [1, \ldots, 10]$ ensures connectedness with high probability when $N$ becomes large and avoids complete graphs (see [19] for more details).
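The construction of a Random Geometric Graph described above can be sketched as follows. The function names, the seed, and the depth-first connectivity test are illustrative choices; since connectedness only holds with high probability, a simulation would typically redraw the graph when the test fails.

```python
import numpy as np

def random_geometric_graph(N, r0, rng):
    """Adjacency matrix of an RGG in the unit square with r = sqrt(r0 log(N) / N)."""
    points = rng.random((N, 2))                      # uniform positions
    r = np.sqrt(r0 * np.log(N) / N)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    A = (dist < r).astype(int)
    np.fill_diagonal(A, 0)                           # no self-loops
    return A, r

def is_connected(A):
    """Depth-first search from node 0 over the adjacency matrix A."""
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in np.flatnonzero(A[u]):
            v = int(v)
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == A.shape[0]

A, r = random_geometric_graph(100, 4.0, np.random.default_rng(0))
connected = is_connected(A)
```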

⁴ Indeed, the vector $v$ used in our formulation can be replaced with the left eigenvector corresponding to the eigenvalue 1 (see the proof of Lemma 5 for more details), which is proportional to $\mathbf{1}$ here due to the double-stochasticity of the update matrices.

In Fig. 1, we plot the empirical mean squared error versus time for different gossip algorithms: i) the
Random Gossip [JJ which is the reference algorithm in the literature; ii) the Broadcast Gossip introduced 
in [9], which uses the broadcasting abilities of the wireless channel but does not converge to the average; iii) the algorithm introduced by Franceschelli in [11], which uses a bivariate scheme and seems to converge (no convergence proof is provided in the paper); and iv) the proposed BWGossip algorithm. A Random Geometric Graph with $N = 100$ sensors and $r_0 = 4$ has been considered. We remark that the BWGossip algorithm outperforms the existing algorithms without adding routing or any other kind of complexity.

In Fig. 2, we plot the empirical mean squared error for the BWGossip algorithm versus time with different clock tuning coefficients (see Subsection IV-B and Eq. (18) for more details). Compared to the algorithm without clock management ($\alpha = 1$), the convergence is much faster at the beginning with $\alpha = 0$ but the asymptotic rate is lower; with $\alpha = 0.5$, the performance is better than that of the plain BWGossip at any time.

In Fig. 3, we display the empirical convergence slope⁵ and the associated lower bound $\kappa$ derived in Theorem 2 for the BWGossip algorithm versus the number of sensors $N$. Different Random Geometric Graphs with $r_0 = 4$ have been considered. We observe a very good agreement between the empirical slope and the proposed lower bound. Consequently, our bound is very tight.
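An empirical convergence slope of this kind can be estimated by a linear fit on the logarithm of the mean squared error. The sketch below uses synthetic data with a known decay rate; the function name, the cut-off `t_min`, and the synthetic curve are assumptions for illustration and do not reproduce the simulations of the paper.

```python
import numpy as np

def empirical_slope(times, mse, t_min=0.0):
    """Estimate kappa such that mse(t) ~ C * exp(-kappa * t) for t >= t_min."""
    times = np.asarray(times, dtype=float)
    mse = np.asarray(mse, dtype=float)
    keep = times >= t_min                      # discard the transient regime
    slope, _intercept = np.polyfit(times[keep], np.log(mse[keep]), 1)
    return -slope

# Synthetic example: exact exponential decay with rate 0.05
t = np.arange(0, 600)
mse = 3.0 * np.exp(-0.05 * t)
kappa_hat = empirical_slope(t, mse, t_min=100)
```

The estimate `kappa_hat` can then be compared with the lower bound $\kappa$ computed from the spectral radius of the relevant matrix.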

In Fig. 4, we display the empirical convergence slope, the associated lower bound $\kappa$, and the bound given in [4] for the Random Gossip algorithm versus the number of sensors $N$. The proposed bound $\kappa$ fits much better than the one proposed in [4]. Actually, the proposed bound matches the empirical slope very well (see Subsection V-C for more details).

Thanks to Fig. 5, we inspect the influence of link failures in the underlying communication graph on the BWGossip algorithm. We consider a Random Geometric Graph with 10 sensors and $r_0 = 1$ onto which i.i.d. link failure events appear with probability $p_e$. In Fig. 5a, we plot the empirical mean squared error of the BWGossip versus time for different values of the edge failure probability $p_e$. As expected, we observe that the higher $p_e$, the slower the convergence, but the MSE still decreases exponentially. Then, in Fig. 5b, we plot the empirical convergence slope and the associated bound $\kappa$ for different link failure probabilities. Here, $\kappa$ is computed according to a modified matrix set taking into account the link failures through different update matrices. We remark a very good fit between our lower bound and the simulated results. Consequently, computing $\kappa$ on the matrix set including the link failures enables us to predict the convergence speed very well in this context.

⁵ This slope has been obtained by linear regression on the logarithm of the empirical mean squared error. This regression makes sense since, for the inspected algorithms, the mean squared error in log scale is almost linear for $t$ large enough, as can be seen in the mean squared error figures.

VII. Conclusion

In this paper, we have analyzed the convergence of the Sum-Weight-like algorithms (relying on two 
variables rather than one) for distributed averaging in a Wireless Sensor Network. We especially give 
a very precise insight on the convergence speed of the squared error for such algorithms. In addition, 
we proposed a particular Sum-Weight-like algorithm taking full advantage of the broadcast nature of the 
wireless channel. We observed that this algorithm significantly outperforms the existing ones. 

Appendix A 
Proof of Lemma Q] 

(B) ⇒ (B1) Let us denote by $K^{(u,v)}$ a matrix of $\mathcal{K}$ whose $(u,v)$-th coefficient is positive. As the graph associated with $E[K]$ is connected, for all couples of nodes $(i, j)$, there is a path of finite length $L_{ij} < N$ from $i$ to $j$: $(i = u_1, \ldots, u_{L_{ij}} = j)$. Consequently, the matrix $K^{i \to j} = K^{(u_1,u_2)} K^{(u_2,u_3)} \cdots K^{(u_{L_{ij}-1},u_{L_{ij}})}$ verifies $(K^{i \to j})_{i,j} > 0$, which gives us a realization of $P(L_{ij})$ verifying $(P(L_{ij}))_{i,j} > 0$.
(B1) ⇒ (B2) Let us take $L = \sum_{i,j} L_{ij} < 2N^2$. Since each matrix has a positive diagonal according to Assumption (A2), $\prod_{i,j=1}^{N} K^{i \to j}$ is a possible realization of $P(L)$ of strictly positive probability which is a positive matrix.

(B2) ⇒ (B3) If there is an $L < 2N^2$ and a realization $p$ of $P(L)$ such that $\mathbb{P}[P(L) = p] > 0$ and $p > 0$, then $p \otimes p$ is also positive. Since $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$ for any matrices $A$, $B$, $C$, $D$ with the appropriate dimensions,

$$(E[K \otimes K])^L = \left( \sum_{i=1}^{M} p_i K_i \otimes K_i \right)^L \geq \mathbb{P}[P(L) = p] \; p \otimes p > 0.$$

Hence, $E[K \otimes K]$ is a primitive matrix.

(B3) ⇒ (B) First, we compare $E[K] \otimes E[K]$ with $E[K \otimes K]$:

$$E[K] \otimes E[K] = \sum_{i=1}^{M} \sum_{j=1}^{M} p_i p_j K_i \otimes K_j \geq \sum_{i=1}^{M} p_i^2 K_i \otimes K_i \geq (\min_j p_j) \sum_{i=1}^{M} p_i K_i \otimes K_i = (\min_j p_j) E[K \otimes K].$$

Hence, as there exists $k$ such that $(E[K \otimes K])^k > 0$, we have $(E[K])^k \otimes (E[K])^k = (E[K] \otimes E[K])^k \geq (\min_j p_j)^k (E[K \otimes K])^k > 0$, and therefore $(E[K])^k > 0$, so the primitivity of $E[K]$ is proven.

Appendix B
Derivations related to Section V

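A. Derivations for Eq. (20)

According to Eq. (19), we easily have that

$$K(t)K(t)^T = \frac{1}{4} I + \frac{1}{4} \sum_{i=1}^{N} e_i e_{j_i(t)}^T + \frac{1}{4} \sum_{i=1}^{N} e_{j_i(t)} e_i^T + \frac{1}{4} \sum_{i=1}^{N} \sum_{i'=1}^{N} e_i e_{j_i(t)}^T e_{j_{i'}(t)} e_{i'}^T.$$

By remarking that $e_{j_i(t)}^T e_{j_i(t)} = 1$, we have

$$K(t)K(t)^T = \frac{1}{2} I + \frac{1}{4} \sum_{i=1}^{N} e_i e_{j_i(t)}^T + \frac{1}{4} \sum_{i=1}^{N} e_{j_i(t)} e_i^T + \frac{1}{4} \sum_{i=1}^{N} \sum_{i' \neq i} e_i e_{j_i(t)}^T e_{j_{i'}(t)} e_{i'}^T.$$

The randomness in $K(t)K(t)^T$ is only due to the choice of the nodes $j_i(t)$ for $i \in \{1, \ldots, N\}$. Therefore, each $j_i(t)$ will be modeled by a random variable $\ell^{(i)}$ (independent of $t$). The random variables $\{\ell^{(i)}\}_{i=1,\ldots,N}$ are i.i.d. and uniformly distributed over $\{1, \ldots, N\}$. As a consequence, we obtain

$$E[KK^T] = \frac{1}{2} I + \frac{1}{4} \sum_{i=1}^{N} e_i \left( \frac{1}{N} \sum_{k=1}^{N} e_k \right)^T + \frac{1}{4} \sum_{i=1}^{N} \left( \frac{1}{N} \sum_{k=1}^{N} e_k \right) e_i^T + \frac{1}{4} \sum_{i=1}^{N} \sum_{i' \neq i} e_i \left( \frac{1}{N^2} \sum_{k,k'=1}^{N} e_k^T e_{k'} \right) e_{i'}^T.$$

By remarking that $e_k^T e_{k'} = 0$ as soon as $k \neq k'$, we have $\sum_{k,k'=1}^{N} e_k^T e_{k'} = N$. Furthermore, as $\sum_{k=1}^{N} e_k = \mathbf{1}$ and $\sum_{i=1}^{N} \sum_{i' \neq i} e_i e_{i'}^T = \mathbf{1}\mathbf{1}^T - I$, we obtain

$$E[KK^T] = \left( \frac{1}{2} - \frac{1}{4N} \right) I + \frac{3}{4} J.$$

It is then straightforward to obtain Eq. (20).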
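The closed form of $E[KK^T]$ and Eq. (20) can be confirmed by exhaustive enumeration for small $N$. The sketch below averages $K K^T$ over the $N^N$ equiprobable choices of the targets $j_i(t)$; the function name is illustrative and the approach is only tractable for very small networks.

```python
import itertools
import numpy as np

def push_sum_EKKt(N):
    """Exact E[K K^T] for Push-Sum on the complete graph, by enumeration."""
    I = np.eye(N)
    total = np.zeros((N, N))
    for targets in itertools.product(range(N), repeat=N):
        K = 0.5 * I                      # Eq. (19): half kept ...
        for i, j in enumerate(targets):
            K[i, j] += 0.5               # ... half sent to the chosen node
        total += K @ K.T
    return total / N ** N

N = 4
I, J = np.eye(N), np.ones((N, N)) / N
EKKt = push_sum_EKKt(N)
closed_form = (0.5 - 1.0 / (4 * N)) * I + 0.75 * J
M = (I - J) @ EKKt @ (I - J)             # left-hand side of Eq. (20)
```

Both `EKKt` and `M` match their closed-form expressions, the latter being $(1/2 - 1/(4N))(I - J)$.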



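B. Derivations for Eq. (22)

Once again, according to Eq. (19), we easily have that

$$K(t) \otimes K(t) = \frac{1}{4} I \otimes I + \frac{1}{4} \left( \sum_{i=1}^{N} e_i e_{\ell^{(i)}}^T \right) \otimes I + \frac{1}{4} I \otimes \left( \sum_{i=1}^{N} e_i e_{\ell^{(i)}}^T \right) + \frac{1}{4} \underbrace{\left( \sum_{i=1}^{N} e_i e_{\ell^{(i)}}^T \right) \otimes \left( \sum_{i=1}^{N} e_i e_{\ell^{(i)}}^T \right)}_{\xi} \qquad (25)$$

where, as in Appendix B-A, $\ell^{(i)}$ models the index $j_i(t)$. Using the same technique as in Appendix B-A, we obtain that

$$E\left[ \sum_{i=1}^{N} e_i e_{\ell^{(i)}}^T \right] = \sum_{i=1}^{N} e_i \left( \frac{1}{N} \sum_{k=1}^{N} e_k \right)^T = J. \qquad (26)$$

Thus, it just remains to evaluate $E[\xi]$. Let us first remark that

$$\xi = \sum_{i=1}^{N} \sum_{i' \neq i} \left( e_i e_{\ell^{(i)}}^T \right) \otimes \left( e_{i'} e_{\ell^{(i')}}^T \right) + \sum_{i=1}^{N} \left( e_i e_{\ell^{(i)}}^T \right) \otimes \left( e_i e_{\ell^{(i)}}^T \right).$$

As a consequence, we have

$$E[\xi] = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{i' \neq i} \sum_{k=1}^{N} \sum_{k'=1}^{N} (e_i e_k^T) \otimes (e_{i'} e_{k'}^T) + \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{N} (e_i e_k^T) \otimes (e_i e_k^T)$$

$$= \frac{1}{N^2} \sum_{i=1}^{N} \sum_{i'=1}^{N} \sum_{k=1}^{N} \sum_{k'=1}^{N} (e_i e_k^T) \otimes (e_{i'} e_{k'}^T) + \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{N} (e_i e_k^T) \otimes (e_i e_k^T) - \frac{1}{N^2} \sum_{i=1}^{N} \sum_{k=1}^{N} \sum_{k'=1}^{N} (e_i e_k^T) \otimes (e_i e_{k'}^T).$$

Using the well-known result on Kronecker products ($(AB) \otimes (CD) = (A \otimes C)(B \otimes D)$ for four matrices $A$, $B$, $C$, and $D$ with appropriate sizes), we have

$$E[\xi] = J \otimes J + \frac{1}{N} u u^T - \frac{1}{N^2} u \mathbf{1}_{N^2}^T. \qquad (27)$$

Putting Eqs. (26)-(27) into Eq. (25), we get

$$E[K \otimes K] = \frac{1}{4} I \otimes I + \frac{1}{4} J \otimes I + \frac{1}{4} I \otimes J + \frac{1}{4} J \otimes J + \frac{1}{4N} u u^T - \frac{1}{4N^2} u \mathbf{1}_{N^2}^T.$$

Before going further, let us remark that

$$((I - J) \otimes (I - J)) u = \sum_{i=1}^{N} \left( e_i - \frac{1}{N} \mathbf{1}\mathbf{1}^T e_i \right) \otimes \left( e_i - \frac{1}{N} \mathbf{1}\mathbf{1}^T e_i \right)$$

$$= \sum_{i=1}^{N} e_i \otimes e_i - \sum_{i=1}^{N} \left( e_i \otimes \frac{1}{N} \mathbf{1} \right) - \sum_{i=1}^{N} \left( \frac{1}{N} \mathbf{1} \otimes e_i \right) + \frac{1}{N^2} \sum_{i=1}^{N} \mathbf{1} \otimes \mathbf{1}$$

$$= u - \frac{1}{N} \mathbf{1}_{N^2}. \qquad (28)$$

As a consequence, we have

$$R = ((I - J) \otimes (I - J)) E[K \otimes K]$$

$$= \frac{1}{4} (I - J) \otimes (I - J) + \frac{1}{4N} \left( u - \frac{1}{N} \mathbf{1}_{N^2} \right) u^T - \frac{1}{4N^2} \left( u - \frac{1}{N} \mathbf{1}_{N^2} \right) \mathbf{1}_{N^2}^T$$

$$= \frac{1}{4} (I - J) \otimes (I - J) + \frac{1}{4N} \left( u u^T - \frac{1}{N} \mathbf{1}_{N^2} u^T - \frac{1}{N} u \mathbf{1}_{N^2}^T + \frac{1}{N^2} \mathbf{1}_{N^2} \mathbf{1}_{N^2}^T \right).$$

Let us recall that $v = \frac{1}{\sqrt{N-1}} \left( u - \frac{1}{N} \mathbf{1}_{N^2} \right)$. Thanks to Eq. (28), we have

$$v v^T = \frac{1}{N-1} \left( u u^T - \frac{1}{N} \mathbf{1}_{N^2} u^T - \frac{1}{N} u \mathbf{1}_{N^2}^T + J \otimes J \right)$$

which straightforwardly leads to Eq. (22).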

In addition, note that using Eq. (28), we have $((I - J) \otimes (I - J)) v = v$.

Fig. 1: Mean squared error of the BWGossip and other famous algorithms versus time.

Fig. 2: Mean squared error of the BWGossip versus time for different clock management schemes.

Fig. 3: Empirical convergence slope of the BWGossip and the associated lower bound $\kappa$.

Fig. 4: Empirical convergence slope of the Random Gossip, the associated lower bound $\kappa$, and the bound given in [4].

Fig. 5: BWGossip analysis in the presence of link failures. (a) Mean squared error versus time for different link failure probabilities. (b) Empirical convergence slope and the associated lower bound $\kappa$ versus link failure probabilities.